
PATENT APPLICATION 
ATTY REF NO. 20480.004 

The invention includes a polypeptide comprising an amino acid sequence having 

sequence identity to SEQ ID NO: 146. The invention includes a fragment of a 

polypeptide comprising SEQ ID NO: 146. The invention includes a diagnostic kit 

comprising a polypeptide comprising SEQ ID NO: 146, or a fragment thereof. The 

invention includes a diagnostic kit comprising a polynucleotide sequence encoding SEQ 

ID NO: 146, or a fragment thereof. The invention includes an immunogenic composition 

comprising a polypeptide comprising SEQ ID NO: 146, or a fragment thereof. The 

invention includes an antibody which recognizes a polypeptide comprising SEQ ID NO: 

146, or a fragment thereof. 

SEQ ID NO: 146 also contains an open reading frame comprising SEQ ID NO: 

147. The invention includes a polypeptide comprising SEQ ID NO: 147. SEQ ID NO: 
147 is set forth below. 

SEQ ID NO: 147 

MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYLTQDLFLP 

FYSNVTGFHTINHTFGWVffFKDGIWAATEKSNVVRGWVFGSTMNNKSQSVIirNNSTN 

VVIRACNFELCDNPFFAVSKPMGTQTHTMIFDNAFNCTFEYISDAFSLDVSEKSGNFKHL 

REFVFKNKDGFLYVYKGYQPIDVVRDLPSGFNTLKPIFKLPLGINITNFRAILTAFSPAQDI 

WGTSAAAWVGYLICPTTFMI.KYDRNGTITDAVDCSQNPLAELKCSVKSFETDKGrYQTS 

NFRVVPSGDVVRFPNTTNLCPFGEVFNATKFPSVYAWERKKISNCVADYSVLYNSTFFST 

FKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFMGCVL 

AWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDISNVPFSPDGKPCTPPALNCYWPLND 

YGFYTTTGTGYQPYRVWLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTP 

SSKRFQPFQQFGRDVSDFTDSVRDPKTSEELDISPCAFGGVSVITPGTNASSEVAVLYQDV 

NCTDVSTAfflADQLTPAWRT^STGNNVFQTQAGCI.TGARHVDTSYECDIPIGAGICASYH 

TVSLLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVSMAKTSVDCNMY 

ICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQMYKTPTLKYFGGF 

NTSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGLTVL 

PPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQtPFAMQMAYRFNGIGVTQNVLYEN 

QKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDI 

LSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRV 

DFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICHEGKAYFPREGVFVFNG 

TSWITQRNFFSPQIITTDNTFVSGNCDVVIGIINNTVYDPLQPELDSFKEELDKYFKNHTS 

PDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFI 

AGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLIIYT 

The invention includes a polypeptide comprising an amino acid sequence having 
sequence identity to SEQ ID NO: 147. The invention includes a fragment of a 
polypeptide comprising SEQ ID NO: 147. The invention includes a diagnostic kit 
comprising a polypeptide comprising SEQ ID NO: 147, or a fragment thereof. The 
invention includes a diagnostic kit comprising a polynucleotide sequence encoding SEQ 
ID NO: 147, or a fragment thereof. The invention includes an immunogenic composition 
comprising a polypeptide comprising SEQ ID NO: 147, or a fragment thereof. The 
invention includes an antibody which recognizes a polypeptide comprising SEQ ID NO: 
147, or a fragment thereof. SEQ ID NO: 147 demonstrates functional homology to a 
coronavirus spike protein. 

Predicted transmembrane regions of SEQ ID NO: 147 are identified below. 
Predicted Transmembrane helices of SEQ ID NO: 147 
The sequence positions in brackets denominate the core region. 
Only scores above 500 are considered significant. 
SEQ ID NO: 147, the spike protein, is a surface exposed polypeptide. 
Recombinant expression of a protein can be hindered by hydrophobic transmembrane 
regions. Accordingly, the invention includes a polypeptide comprising SEQ ID NO: 147 
wherein one or more of the hydrophobic regions identified above is removed. The 
invention further includes a polynucleotide encoding such a polypeptide. The invention 
includes recombinantly expressing the protein in a host cell. 

Further characterization of SEQ ID NO: 147 is set forth below. 
PSORT --- Prediction of Protein Localization Sites 

version 6.4 (WWW) 

MYSEQ 1255 Residues 

Species classification: 4 

*** Reasoning Step: 1 

Preliminary Calculation of ALOM (threshold: 0.5) 
count : 2 

Position of the most N-terminal TMS : 496 at i=2 
MTOP: membrane topology (Hartmann et al.) 

I (middle): 503 Charge diffirence (C-N) : 1.0 
McG: Examining signal sequence (McGeoch) 

Length of UR : 13 

Peak Value of UR: 3.28 

Net Charge of CR: 0 

Discriminant Score: 8.66 
GvH: Examining signal sequence (von Heijne) 

Signal Score (-3.5): 5.94 

Possible cleavage site: 13 
>>> Seems to have a cleavable N-term signal seq. 
Amino Acid Composition of Predicted Mature Form: 

calculated from 14 
ALOM new cnt : 1 ** thrshld changed to -2 
Cleavable signal was detected in ALOM? : 0B 
ALOM: finding transmembrane regions (Klein et al.) 

count: 1 value: -12.26 threshold: -2.0 

INTEGRAL Likelihood =-12.26 Transmembrane 1202 - 1218 (1194 - 1228) 

PERIPHERAL Likelihood = 0.16 

modified ALOM score: 2.55 
>>> Seems to be a Type Ia membrane protein 

The cytoplasmic tail is from 1219 to 1255 (37 Residues) 
Rule : vesicular pathway 
Rule: vesicular pathway 
Rule : vesicular pathway 
(14) or uncleavable? 

Gavel: Examining the boundary of mitochondrial targeting seq. 
motif at: 14 

Uncleavable? Ipos set to: 24 
Discrimination of mitochondrial target seq. : 

positive ( 2.18) 
Rule: vesicular pathway 
Rule: vesicular pathway 
Rule: vesicular pathway 

*** Reasoning Step: 2 

KDEL Count : 0 

Checking apolar signal for intramitochondrial sorting 

(Gavel position 24) from: 1 to: 10 Score: 8.0 
SKL motif (signal for peroxisomal protein) : 

pos: 964(1255), count: 1 SRL 

SKL score (peroxisome): 0.1 
Amino Acid Composition Tendency for Peroxisome: 1.37 

AAC not from the N-term. , score modified 
Peroxisomal proteins? Status: notclr 

AAC score (peroxisome): 0.079 
Amino Acid Composition tendency for lysosomal proteins 

score: 0.39 Status: notclr 
GY motif in the tail of typeIa? (lysosomal) 
Checking the amount of Basic Residues (nucleus) 
Checking the 4 residue pattern for Nuclear Targeting 
Checking the 7 residue pattern for Nuclear Targeting 
Checking the Robbins & Dingwall consensus (nucleus) 
Checking the RNA binding motif (nucleus or cytoplasm) 
Nuclear Signal Status: negative ( 0.00) 
Type Ia is favored for plasma memb. proteins 
Checking the NPXY motif.. 
Checking the YXRF motif.. 
Checking N myristoylation.. 

Final Results 

plasma membrane                   Certainty= 0.460 (Affirmative) < succ>
microbody (peroxisome)            Certainty= 0.171 (Affirmative) < succ>
endoplasmic reticulum (membrane)  Certainty= 0.100 (Affirmative) < succ>
endoplasmic reticulum (lumen)     Certainty= 0.100 (Affirmative) < succ>



SEQ ID NO: 147 appears to have an N-terminus signaling region, followed by a 
surface exposed region, followed by a transmembrane region followed by a C-terminus 
cytoplasmic domain region. Accordingly, the invention includes an immunogenic, 
surface exposed fragment of SEQ ID NO: 147. Preferably, said fragment comprises an 
amino acid sequence which does not include the last 50 amino acids of the C-terminus of 
SEQ ID NO: 147. Preferably, said fragment comprises an amino acid sequence which 
does not include the last 70 amino acids of the C-terminus of SEQ ID NO: 147. 
Preferably, said fragment does not include a transmembrane region of SEQ ID NO: 147. 
Preferably, said fragment does not include a C-terminus cytoplasmic domain of SEQ ID 
NO: 147. Preferably, said fragment does not include an N-terminus signal sequence. 
Preferably, said fragment does not include amino acids 1 - 10 of the N-terminus of SEQ 
ID NO: 147. Preferably, said fragment does not include amino acids 1-14 of the N- 
terminus of SEQ ID NO: 147. 
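
The preferred truncations can be checked against the coordinates PSORT reports above (1255 residues, mature form calculated from residue 14, transmembrane core 1202-1218 with outer bracket 1194-1228, cytoplasmic tail 1219-1255). The following Python sketch is illustrative only: the function names are hypothetical and are not part of the application. It computes the 1-based bounds that remain after trimming each terminus and tests whether the remainder still overlaps a given region.

```python
from typing import Tuple

# Coordinates (1-based, inclusive) copied from the PSORT output quoted above.
SEQ_LENGTH = 1255
TM_CORE = (1202, 1218)      # predicted transmembrane core
TM_BRACKET = (1194, 1228)   # outer bracket reported alongside the core
CYTO_TAIL = (1219, 1255)    # cytoplasmic tail (37 residues)


def trimmed_bounds(n_trim: int, c_trim: int, length: int = SEQ_LENGTH) -> Tuple[int, int]:
    """Return 1-based inclusive (start, end) after removing n_trim N-terminal
    and c_trim C-terminal residues. Hypothetical helper for illustration."""
    if n_trim < 0 or c_trim < 0 or n_trim + c_trim >= length:
        raise ValueError("trims must be non-negative and leave at least one residue")
    return n_trim + 1, length - c_trim


def overlaps(fragment: Tuple[int, int], region: Tuple[int, int]) -> bool:
    """True if two 1-based inclusive intervals share at least one residue."""
    return fragment[0] <= region[1] and fragment[1] >= region[0]


def slice_fragment(seq: str, bounds: Tuple[int, int]) -> str:
    """Extract the fragment from a sequence string using 1-based inclusive bounds."""
    start, end = bounds
    return seq[start - 1:end]


if __name__ == "__main__":
    for c_trim in (50, 70):
        frag = trimmed_bounds(14, c_trim)
        print(c_trim, frag, "overlaps TM core:", overlaps(frag, TM_CORE))
```

With these numbers, removing amino acids 1-14 and the last 50 residues leaves a fragment ending at residue 1205, which still reaches into the predicted transmembrane core (1202-1218). Removing the last 70 residues ends the fragment at residue 1185, upstream of the outer bracket that begins at 1194.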

The spike protein of coronaviruses may be cleaved into two separate chains, S1 
and S2. The chains may remain associated together to form a dimer or a 
trimer. Accordingly, the invention includes a polypeptide comprising SEQ ID NO: 147 
wherein said polypeptide has been cleaved into S1 and S2 domains. The invention 
further includes a polypeptide comprising SEQ ID NO: 147 wherein amino acids 1-10, 
preferably amino acids 1-14, of the N-terminus are removed and further wherein SEQ 
ID NO: 147 is cleaved into S1 and S2 domains. Preferably the polypeptide is in the form 
of a trimer. 

Predicted N-glycosylation sites of SEQ ID NO: 147 are identified below: 



Prediction of N-glycosylation sites of SEQ ID NO: 147 
Accordingly, the invention includes a polypeptide comprising a fragment of SEQ 
ID NO: 147 wherein said fragment comprises one or more of the glycosylation sites 
identified above. The invention further includes a polynucleotide encoding one or more 
of the fragments identified above. Each such glycosylation site can be covalently attached to a 
saccharide. Accordingly, the invention includes a polypeptide comprising a fragment of 
SEQ ID NO: 147 wherein said fragment comprises one or more of the glycosylation sites 
identified above and wherein said polypeptide is glycosylated at one or more of the sites 
identified above. 
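
The prediction program used for the N-glycosylation sites is not named in this section. As a rough cross-check only, candidate N-linked sites are commonly located by scanning for the N-X-S/T sequon, where X is any residue except proline. The Python sketch below assumes that motif. It is not presented as the method used to produce the predictions in this application, and the function name is hypothetical. The example string is a short excerpt copied from the SEQ ID NO: 147 listing above, so the reported positions are relative to the excerpt, not to the full-length sequence.

```python
import re
from typing import List, Tuple

# N-X-[S/T] with X != P. The lookahead lets overlapping sequons
# (for example "NNST") be reported individually.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_n_glyc_sequons(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the Asn, matched tripeptide) for each sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    # Excerpt from the listing above (one printed line of SEQ ID NO: 147).
    excerpt = "NFRVVPSGDVVRFPNTTNLCPFGEVFNATKFPSVYAWERKKISNCVADYSVLYNSTFFST"
    for pos, motif in find_n_glyc_sequons(excerpt):
        print(pos, motif)
```

A sequon match only marks a site as a candidate; whether it is actually glycosylated depends on the expression host and on local structure.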

Predicted O-glycosylation sites are identified below: 

Prediction of O-glycosylation sites 

Name        Residue   No.   Potential   Threshold
SEQID 147   Thr       698   0.8922
SEQID 147   Thr       706   0.9598
SEQID 147   Thr       922   0.9141      0.7338
SEQID 147   Ser        36   0.8906      0.7264
SEQID 147   Ser       703   0.8412      0.7676
The invention includes a polypeptide comprising a fragment of SEQ ID NO: 147 
wherein said fragment comprises one or more of the O-glycosylation sites identified 
above. The invention further includes a polynucleotide encoding one or more of the 
fragments identified above. The invention further includes a polypeptide comprising a 
fragment of SEQ ID NO: 147 wherein said fragment comprises one or more of the O- 
glycosylation sites identified above and further wherein the polypeptide is covalently 
bonded to a saccharide at one or more of the included glycosylation sites. 

The invention further includes a polypeptide comprising a fragment of SEQ ID 
NO: 147 wherein said fragment comprises one or more of the N-glycosylation sites 
identified above and further wherein said fragment comprises one or more of the O- 
glycosylation sites identified above. 

The invention includes a polypeptide comprising a fragment of SEQ ID NO: 147 
wherein said fragment does not include one or more of the glycosylation sites identified 
above. The invention also includes a polynucleotide encoding such a polypeptide. 

Predicted phosphorylation sites of SEQ ID NO: 147 are Ser-346, Tyr-195, and 
Tyr-723. Accordingly, the invention comprises a polypeptide comprising a fragment of 
SEQ ID NO: 147 wherein said fragment comprises at least ten amino acid residues and 
wherein said fragment comprises one or more of the amino acids selected from the group 
consisting of Ser-346, Tyr-195, and Tyr-723. In one embodiment, one or more of the 
amino acids selected from the group consisting of Ser-346, Tyr-195, and Tyr-723 are 
phosphorylated. 
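
To make the "at least ten amino acid residues" condition concrete, the Python sketch below enumerates every window of a chosen length that covers a given 1-based residue position. It is a minimal illustration under stated assumptions: the function name and the `expected` check are hypothetical, and the demonstration uses a toy string, not SEQ ID NO: 147.

```python
from typing import List, Optional, Tuple


def fragments_containing(seq: str, position: int, length: int = 10,
                         expected: Optional[str] = None) -> List[Tuple[int, int, str]]:
    """List (start, end, fragment) for every window of `length` residues that
    includes the 1-based `position`. If `expected` is given, first verify that
    the residue at `position` matches it (guards against numbering mismatches)."""
    n = len(seq)
    if not 1 <= position <= n:
        raise ValueError("position is outside the sequence")
    if not 1 <= length <= n:
        raise ValueError("length must be between 1 and the sequence length")
    if expected is not None and seq[position - 1] != expected:
        raise ValueError("residue at position does not match the expected residue")
    first = max(1, position - length + 1)   # leftmost window start covering position
    last = min(position, n - length + 1)    # rightmost window start covering position
    return [(s, s + length - 1, seq[s - 1:s - 1 + length]) for s in range(first, last + 1)]


if __name__ == "__main__":
    toy = "ABCDEFGHIJKLMNO"  # toy sequence for illustration only
    for start, end, frag in fragments_containing(toy, 8, length=10, expected="H"):
        print(start, end, frag)
```

Longer fragments follow by raising `length`; a position near either terminus simply yields fewer windows.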

Predicted coiled coils of SEQ ID NO: 147 are identified below: 
Coiled coil Prediction: 
Accordingly, the invention comprises a polypeptide sequence comprising a 
fragment of SEQ ID NO: 147 wherein said fragment includes a coiled region of SEQ ID 
NO: 147. The invention comprises a polypeptide sequence comprising a fragment of 
SEQ ID NO: 147, wherein said fragment does not include a coiled region of SEQ ID NO: 
147. 

The ORF1a and ORF1b sequences of coronaviruses are typically translated as a 
single ORF1ab polyprotein. Slippage of the ribosome during translation generates a -1 
frameshift. One region of such slippage is illustrated below: 



gggttttacacttagaaacacagtctgtaccgtctgcggaatgtggaaaggttatggctgtagttgtga 
+1 GFTLRNTVCTVCGMWKGYGCSCD 
+3 GFYT-KHSLYRLRNVERLWL-L 

ccaactccgcgaacccttgatgcagtctgcggatgcatcaacgtttttaaacgggtttgcggtgtaagt 
+1 QLREPLMQSADASTFLNGFAV-V 
+3 PTPRTLDAVCGCINVFKRVCGVS 

gcagcccgtcttacaccgtgcggcacaggcactagtactg 
+1 QPVLHRAAQALVL 
+3 AARLTPCGTGTST 

which would generate the following translational slippage: 



ccaactccgcgaacccttgatgcagtctgcggatgcatcaacgtttttaaacgggtttgcggtgtaagt 

QLREPLMQSADASTFLNRVCGVS 
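
The frame switch shown above can be reproduced with a short Python sketch. It assumes the standard genetic code and writes stop codons as "-" to match the listing. The slip point is placed at the end of the `tttaaac` run in the quoted nucleotide line because that is where the product above changes frame; this placement is inferred from the listing and is not stated in the text. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code in t, c, a, g order; stop codons written as "-".
BASES = "tcag"
AMINO = ("FFLLSSSSYY--CC-W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate from the first base; trailing partial codons are ignored."""
    dna = dna.lower()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def translate_with_minus1_slip(dna: str, start: int, slip_site: str = "tttaaac") -> str:
    """Translate in frame from 0-based `start` through `slip_site`, then step
    back one nucleotide (-1 frameshift) and continue in the new frame."""
    dna = dna.lower()
    p = dna.find(slip_site, start)
    if p < 0:
        raise ValueError("slip site not found")
    end = p + len(slip_site)
    if (end - start) % 3 != 0:
        raise ValueError("slip site does not end on a codon boundary of this frame")
    # The last base of the slip site is read twice: once before and once after the slip.
    return translate(dna[start:end]) + translate(dna[end - 1:])


if __name__ == "__main__":
    segment = "ccaactccgcgaacccttgatgcagtctgcggatgcatcaacgtttttaaacgggtttgcggtgtaagt"
    print(translate(segment[1:]))                  # frame labelled +1 in the listing
    print(translate(segment))                      # frame labelled +3 in the listing
    print(translate_with_minus1_slip(segment, 1))  # frameshifted product
```

Run on the second nucleotide line, the frameshifted translation is QLREPLMQSADASTFLNRVCGVS, the same product given above: the first seventeen residues come from the frame labelled +1, and the remaining six from the frame labelled +3.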

Accordingly, the invention includes a polypeptide comprising SEQ ID NO: 148. 
SEQ ID NO: 148 is set forth below. 
SEQ ID NO: 148 

MESLVLGVNEKTHVQLSLPVLQVRDVLVRGFGDSVEEALSEAREHLKNGTCGLVELEKGVI.PQLEQPYV 
FIKRSDALSTNHGHKWELVAEMDGIQYGRSGITLGVLVPHVGETPIAYRNVLLRKNGNKGAGGHSYGI 
DLKSYDLGDELGTDPIEDYEQNWNTKHGSGALRELTRELNGGAVTRYVDNNFCGPDGYPLDCIKDFLAR 